Improving decision trees for acoustic modeling
Authors
Abstract
In the last few years, the power and simplicity of classification trees as acoustic modeling tools have gained them much popularity. In [1], we studied "tree units", which cluster parameters at the HMM level. Building on this earlier work, we examine some new variants of Young et al.'s "tree states", which cluster parameters at the state level [2]. We have experimented with:

1. Making unitary models (which contain additional information about the context)
2. Pruning trees with various severity levels (an idea introduced in [1])
3. Pooling some leaves (an idea adapted from [2])
4. Refining the questions
5. Questions about the position of the phone within the word
6. Lookahead search
7. Making a single tree for each phone

1. MAKING TREE STATES

A tree state is a particular state of an HMM for a given phone (e.g., state 1 of 'a') for which different phonetic contexts have been clustered by means of a decision tree [2]. In our earlier work on tree units, we employed the Gelfand-Ravishankar-Delp iterative growing and pruning algorithm [3] to build the trees. Fortunately, this efficient and easily implemented algorithm carries over almost unchanged to tree states. The algorithm calls for two sets of labelled training data, A and B. One first grows an overtrained tree on set A, prunes it by evaluating how well it predicts set B, then grows it again using set B, prunes it using set A, and so on until two successive pruned trees are identical (see the first sketch below). In practice, we have observed that two expansion-pruning cycles are usually sufficient.

1.1. Growing Tree States

Recall that to grow a decision tree on a set of labelled training data items, one must supply three elements:

• a set of possible yes-no questions;
• a rule for selecting the best question at a node;
• a method for pruning trees to prevent over-training.

The set of possible yes-no questions is application-dependent. In the case of acoustic models, the questions usually concern the identity of the phones surrounding the central phone and can be grouped into classes (see section 1.4), but they can also be about anything the researchers judge relevant. The rule for selecting the best question at a node is based on the idea that the "yes" and "no" models generated when the training data in a node is split by the question should predict this training data as well as possible. If L(P) is the log likelihood that the parent node generated the training data, and L(Y) and L(N) are the corresponding "yes" and "no" log likelihoods, the question chosen is the one that maximizes the gain in log likelihood, L(Y) + L(N) - L(P) (see the second sketch below).
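To make the alternating procedure concrete, here is a minimal Python sketch of the iterative growing-and-pruning loop described above. The callables grow_tree and prune_tree, and the equality test on trees, are assumed placeholder interfaces for illustration, not the implementation used in the paper.

# Sketch of the Gelfand-Ravishankar-Delp alternating grow/prune loop,
# assuming placeholder grow_tree / prune_tree callables.
def iterative_grow_prune(set_a, set_b, grow_tree, prune_tree, max_cycles=10):
    """Alternate between two labelled training sets until two successive
    pruned trees are identical.

    grow_tree(data, seed_tree) -> overtrained tree grown on `data`
    prune_tree(tree, data)     -> tree pruned by how well it predicts `data`
    """
    previous = None
    tree = None
    grow_on, prune_on = set_a, set_b
    for _ in range(max_cycles):
        tree = grow_tree(grow_on, tree)        # overtrain on one set
        tree = prune_tree(tree, prune_on)      # prune against the other set
        if previous is not None and tree == previous:
            break                              # two successive pruned trees agree
        previous = tree
        grow_on, prune_on = prune_on, grow_on  # swap the roles of A and B
    return tree

As noted above, two expansion-pruning cycles of this loop are usually enough in practice.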
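The question-selection rule can be sketched in the same spirit. The helper log_likelihood below is a hypothetical function returning the log likelihood of a node's training data under a model fit to that data (for example, a single Gaussian per node); it is an assumed interface for illustration only.

# Sketch of likelihood-gain question selection: choose the question that
# maximizes L(Y) + L(N) - L(P).
def split_gain(parent_data, yes_data, no_data, log_likelihood):
    """Gain in log likelihood obtained by splitting the parent node."""
    return (log_likelihood(yes_data)
            + log_likelihood(no_data)
            - log_likelihood(parent_data))

def best_question(node_data, questions, log_likelihood):
    """Return the yes-no question with the largest log-likelihood gain.
    Each question is a predicate on a training item, e.g. a test on the
    identity of a phone neighbouring the central phone."""
    best_q, best_gain = None, float("-inf")
    for q in questions:
        yes = [item for item in node_data if q(item)]
        no = [item for item in node_data if not q(item)]
        if not yes or not no:
            continue  # skip questions that do not actually split the data
        gain = split_gain(node_data, yes, no, log_likelihood)
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain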
Similar resources
Pronunciation modeling by sharing Gaussians
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to i...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
High accuracy acoustic modeling based on multi-stage decision tree
In many continuous speech recognition systems based on HMMs, decision-tree-based state tying has been used not only for improving the robustness and accuracy of context-dependent acoustic modeling but also for synthesizing unseen models. To construct the phonetic decision tree, the standard method has used just single-Gaussian triphone models to cluster states. The coarse clusters generated using just ...
Recognizing Sloppy Speech
As speech recognition moves from labs into the real world, the sloppy speech problem emerges as a major challenge. Sloppy speech, or conversational speech, refers to the speaking style people typically use in daily conversations. The recognition error rate for sloppy speech has been found to double that of read speech in many circumstances. Previous work on sloppy speech has focused on modeling...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Predicting The Type of Malaria Using Classification and Regression Decision Trees. Maryam Ashoori (School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran) and Fatemeh Hamzavi (School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran). Abstract: Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
Pronunciation modeling by sharing gaussian densities across phonetic models
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision-trees to generate alternate word pronunciations from phonemic baseforms. Use of such pronunciation models during recognition is known...
Journal title:
Volume, issue:
Pages: -
Publication date: 1996